Goals

This vignette will teach you all about TFs and motiv enrichment analyses for your scATAC data! For example, you will perform DNA sequence motif analysis in Signac. For this, you will explore two complementary options for performing motif analysis: one by finding overrepresented motifs in a set of differentially accessible peaks, one method performing differential motif activity analysis between groups of cells.

References

This vignette is a modified version of this vignette.

Load libraries

As before, we first load all necessary R packages that are needed for this vignette.

suppressPackageStartupMessages({
library(Signac)
library(Seurat)
library(tidyverse)
library(patchwork)
library(chromVAR)
library(TFBSTools)
library(universalmotif)
library(motifmatchr)
library(BSgenome.Mmusculus.UCSC.mm10)
library(BiocParallel)
library(ggseqlogo)
})
set.seed(1990)

Load data

We begin by loading our pre-processed Seurat object from the introductory QC vignette into R. This code is shared among all subsequent vignettes.

# Specify the path to the "outs" folder from cellranger-atac here 
# Make sure to have a trailing slash here
outFolder="/mnt/data/cellranger/outs/"
seu.s = readRDS(file = paste0(outFolder,"obj.filt.rds"))

# Make sure we have the ATAC array in the object
seu.s[["ATAC"]]
## ChromatinAssay data with 158583 features for 3401 cells
## Variable features: 119325 
## Genome: mm10 
## Annotation present: TRUE 
## Motifs present: FALSE 
## Fragment files: 1
# In case the object contains multiple assays, make the ATAC assay the standard
DefaultAssay(seu.s) = "ATAC"

Using a TF motif database and importing it into Seurat

For motif analyses, a TF motif database is needed. Here, users have many choices such as JASPAR, HOCOMOCO, CIS-BP, TRANSFAC, SwissRegulon, or UniProbe.

The HOCOMOCO database

HOCOMOCO is one of the major resources for sequence analysis of transcriptional regulation in mammals and a popular choice for DNA binding specificities for human and mouse transcription factors, and version 11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities.

In order to show you the power and flexibility of R as well as because we have good eperience with HOCOMOCO, we decided to choose this database for this vignette.

Generally speaking, importing the HOCOMOCO database into R would be easy if a designated HOCOMOCO package for R already existed (such as for JASPAR database), but unfortunately, this is not the case to the best of our knowledge. But with the power and flexibility of R, with a few lines of code, we can download the full HOCOMOCO database and all PWM motifs and import it into R without having to go to the terminal at all! Welcome to the world of bioinformatics! By not manually doing these tasks, we also eliminate many possible sources of error, and make this step as reproducible as possible.

Preparation

As noted above, the following part may seem a bit custom. We have a few tasks ahead of us: downloading, pre-processing, converting and overlapping motif locations with our peaks, and finally adding the motif annotation to our Seurat object. Let’s do it step for step.

We will first download the whole HOCOMOCO PWM database (which includes many files), and we then convert the downloaded PWM matrices to proper motif objects in R. PWM stands for Position Weight Matrix and describes the probability to find the respective nucleotides A,C,G,T on each position of a motif. Thus, because motifs have different lengths, each TF PWM file has a different number of rows (one for each motif site) but always 4 columns (A, C, G, T).

# prepare motifs HOCOMOCO
file_motifs_rds = paste0(outFolder,"HOCO11MOUSE.pfmlist.rds")

if (file.exists(file_motifs_rds)) {
  motifsProcessed <- readRDS(file_motifs_rds)
} else {
  
  file_url =  "https://hocomoco11.autosome.ru/final_bundle/hocomoco11/full/MOUSE/mono/HOCOMOCOv11_full_pwm_MOUSE_mono.tar.gz"
  file_downloadLocation =  paste0(outFolder, "HOCOMOCOv11_full_pwm_MOUSE_mono.tar.gz")
  
  download.file(file_url, destfile= file_downloadLocation)
  
  # First, just check contents without extracting
  head(untar(file_downloadLocation,list = TRUE))
  
  # Time to extract the files
  untar(file_downloadLocation, exdir = outFolder, verbose = TRUE)
 
  # for file in pwm/*pwm; do echo "" >> "$file"; done
  pathToPWMs = paste0(outFolder, "/pwm/")
  
  # We can now automatically save all file names into a vector and process it in R directly
  pwmfiles = list.files(pathToPWMs, full.names = TRUE)

  motif.list <- vector("list", length(pwmfiles))

  i = 1 # Counter
  for (fileCurrent in pwmfiles) {
    
    # We now read in the PWM file and create a universalmotif object out of it
    # We here suppress warnings that are not relevant for us for improved output
    motivCur = suppressWarnings(universalmotif::read_matrix(fileCurrent, headers = ">", sep = "", positions = "rows"))
    
    # We now have to convert the current class universalmotif to class "TFBSTools-PFMatrix
  # We here suppress notes that are not relevant for us for improved output
    motivConverted = suppressMessages(convert_motifs(motivCur, class = "TFBSTools-PFMatrix"))
    
    # Save it in our list
    motif.list[[i]] = motivConverted
    
    # Increase our list counter
    i = i +1
  }
  
  # We now have a populated list of all motif objects and are almost done.
  
  # We now set proper names for the list, which we simply take from the parsed files
  names(motif.list) = str_remove(basename(pwmfiles),".pwm")
  
  # Finally, we can convert it to a PFMatrixList object
  motifsProcessed <- do.call(PFMatrixList, motif.list)
  
  saveRDS(motifsProcessed, file = file_motifs_rds)
}

Next, we need to do another step before we finished the motif integration. We now have to scan the DNA sequence of each peak for the presence of each motif and create a motif x feature matrix from a set of genomic ranges. his sounds complicated but can done with the help of just one function!

The following chunk takes a few minutes to compute, and we therefore save the results to an rds file in the same way as we stored the Seurat object into an rdsfile - this allows us to compute the motif matrix only once and load it quickly thereafter, as opposed to computing it every time we run this vignette.

 # Define a new file name that we try to load into R directly if it exists. If not, we create it
 motif.matrix_name = paste0(outFolder,"motifmatrix.mouse.rds")
 
 if (file.exists(motif.matrix_name)) {
     
   motif.matrix = readRDS(file  = motif.matrix_name)
   
 } else {
   
    # Let's create it once
    # Exercise. What does use.counts do?
    motif.matrix <- CreateMotifMatrix(
      features = granges(seu.s[["ATAC"]]),
      pwm = motifsProcessed, genome = 'mm10',
      use.counts = FALSE
    )
    colnames(motif.matrix) = TFBSTools::name(motifsProcessed)
    
    # Save it, so next time we can directly read it into R
    saveRDS(motif.matrix, file  = motif.matrix_name)

 }

Add to Seurat object

Finally, we have all the required components together and we can create a Motif object and add it to our Seurat object, similar to what we did before with the annotations.

# Create a new Motif object to store the results
motifsFinal <- CreateMotifObject(data = motif.matrix, pwm = motifsProcessed)

# Let's print the object summary
motifsFinal 
## A Motif object containing 529 motifs in 158583 regions
# Add it to our Seurat object
Motifs(seu.s[["ATAC"]]) <- motifsFinal 

Motif analysis option 1: chromVAR

We can compute a per-cell motif activity score by running an approach called chromVAR, which identifies motifs associated with variability in chromatin accessibility between cells. This can identify differentially-active motifs between cell types. For more details, see the corresponding paper.

Identifying motif activities or TF motif enrichments can help us predict which regulatory factors are most active in our cell type of interest. Using chromVAR, we can predict enrichment of TF activity on a per-cell basis from sparse chromatin accessibility data. Its two primary outputs of are:

For proper calculation of the deviations scores and the underlying statistics, one also needs a set of background peaks, which are chosen using the chromVAR::getBackgroundPeaks() function which samples peaks based on similarity in GC-content and number of fragments across all samples using the Mahalanobis distance. Here, it is called automatically when using the RunChromVAR wrapper.

Depending on the number of cores, the following code will run 10-30 minutes or so, so please be patient.

nCores = 6
register(MulticoreParam(nCores, progressbar = TRUE))

# Compute the GC content, region lengths, and dinucleotide base frequencies for regions in the assay and add to the feature metadata. This is needed for chromVAR to work
# seu.s <- RegionStats(object = seu.s, genome = BSgenome.Mmusculus.UCSC.mm10)

# Now we can run the chromVAR wrapper
seu.s <- RunChromVAR(object = seu.s, genome = BSgenome.Mmusculus.UCSC.mm10)
## Computing GC bias per region
## Selecting background regions
## Computing deviations from background
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |============================================                          |  64%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==========================================================            |  84%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%
## Constructing chromVAR assay
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')

This took a while but we finally have the chromVAR scores that serve as a proxy for TF activity. Let’s check them out now!

Find all differentially active TFs

First, we find all markers across all clusters based on the chromVAR scores given the identity of cells as defined previously using the dimensionality reduction methods we employed.

DefaultAssay(seu.s) <- 'chromvar'
Idents(seu.s) <- seu.s$ATAC_snn_res.0.5
diffTF_all <- FindAllMarkers(seu.s, only.pos = TRUE, min.pct = 0.25, 
                            logfc.threshold = 0.5, mean.fxn = rowMeans, fc.name = "avg_diff", verbose = FALSE)

Visualise TFs per cluster

Time to visualize our results!

gr1 = diffTF_all %>%
    group_by(cluster) %>%
    slice_max(n = 3, order_by = avg_diff)
gr1
for (cl in sort(unique(gr1$cluster))) {
  
    fts = gr1 %>% filter(cluster == cl) %>% pull(gene) 
  
    p1 = VlnPlot(seu.s, features = fts, slot = "data", log = TRUE, pt.size = 0.1, ) + 
        ylab("TF activity level") & theme(plot.title = element_text(size = 7))
    
    p2 = FeaturePlot(seu.s, features = fts, 
                     min.cutoff = 'q5',max.cutoff = 'q95',
                     pt.size = 0.1, ncol = 3, reduction = "umap")  & theme(plot.title = element_text(size = 7))
    
    pall = p1 / p2 + plot_annotation(title = paste0("Cluster ",cl))
    
    # Let's not print the warnings about non-finite and missing values
    suppressWarnings(print(pall))
}

Summary heatmap

Let’s again create a heatmap, similar to the heatmap for the gene activities, this time with the chromVAR values:

diffTF_all_top10 <- diffTF_all %>%
    group_by(cluster) %>%
    top_n(n = 10, wt = avg_diff) 

DoHeatmap(seu.s, features = diffTF_all_top10$gene, slot = "data") 

Differentially accessible TFs between clusters

Analysis of single-cell chromatin accessibility data generally allows us to detect often novel cell clusters, while how to interpret these clusters remains a big challenge. Let’s try to understand the clusters better. For this, let’s calculate now the differential TF activity between a pair of clusters based on chromVAR scores! Note that this is a different question from what we have done before: So far, we only identified cluster-specific TFs, while now, we would like to which TFs make two clusters different!

Thus, instead of finding all TFs that are deemed differential among all clusters, we can also specifically find marker TFs or marker peaks that are differential only when comparing a particular set of clusters. Let’s try this out! Instead of the function FindAllMarkers, we will use the function FindMarkers.

Let’s first calculate differential TF activity between cluster 1 as compare to cluster 0.

# mean.fxn =    Function to use for fold change or average difference calculation
differential.activity <- FindMarkers(
  object = seu.s,
  ident.1 = '1', ident.2 = '0',
  only.pos = TRUE, mean.fxn = rowMeans,
  fc.name = "avg_diff", 
  assay = 'chromvar',
  verbose = FALSE)

Now that we identified a few TF that are differentially active between the two specified clusters, let’s try to see what their TF binding motifs (TFBMs) look like. The consensus sequence of a TFBM is variable, and there are a number of possible bases at certain positions in the motif, whereas other positions have a fixed base. These are usually illustrated in sequence logo diagrams (see below), where the height of the letter represents how frequently that nucleotide is observed in that position.

We here plot the first 5 TFBMs to peek into the data:

TFs = rownames(differential.activity) %>% head(5) %>% str_replace("-","_")
MotifPlot(object = seu.s, motifs = TFs, assay = 'ATAC')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Motif analysis option 2: Finding overrepresented motifs in peaks

Alternatively, to identify potentially important cell-type-specific regulatory sequences, we can search for DNA motifs that are overrepresented in a set of peaks that are differentially accessible between cell types / clusters.

Marker peak identification

To find them, we can perform a differential accessibility (DA) test. Here, we utilize a method called logistic regression for DA, as suggested by Ntranos et al. 2018 for scRNA-seq data, and add the total number of fragments as a latent variable to mitigate the effect of differential sequencing depth on the result. In addition, for sparse data such as scATAC-seq, it is often necessary to lower the min.pct threshold in FindMarkers() from the default (0.1, which was designed for scRNA-seq data).

Note that we now have to use our original ATAC assay, and not the chromVAR assay as before!

# Now, we work again with the ATAC assay and not the chromVAR assay
DefaultAssay(seu.s) = "ATAC"

# min.pct: only test genes that are detected in a minimum fraction of min.pct cells in either of the two population
da_peaks <- FindMarkers(
  object = seu.s,ident.1 = '0',ident.2 = '1',
  only.pos = TRUE, test.use = 'LR', min.pct = 0.05,
  latent.vars = 'peak_region_fragments', 
  verbose = FALSE)

# get top differentially accessible peaks
top.da.peak <- rownames(da_peaks[da_peaks$p_val < 0.005, ])

Constructing a matched background

We now have a set of differentially active peaks, but how can we judge whether they are statistically meaningful? For proper enrichment analyses and statistics, matching the set of background peaks is essential when finding enriched DNA sequence motifs. A good choice is usually to choose a set of peaks matched for GC content, but it can be sometimes be beneficial to further restrict the background peaks to those that are accessible in the groups of cells compared when finding differentially accessible peaks.

The AccessiblePeaks() function can be used to find a set of peaks that are open in a subset of cells. We can use this function to first restrict the set of possible background peaks to those peaks that were open in the set of cells compared in FindMarkers(), and then create a GC-content-matched set of peaks from this larger set using MatchRegionStats().

# find open peaks
open.peaks <- AccessiblePeaks(seu.s, idents = c("1", "0"))

# match the overall GC content in the peak set

# Calculate meta features for object
seu.s <- RegionStats(object = seu.s, genome = BSgenome.Mmusculus.UCSC.mm10, assay = "ATAC")
meta.feature <- GetAssayData(seu.s, assay = "ATAC", slot = "meta.features")

peaks.matched <- MatchRegionStats(
  meta.feature = meta.feature[open.peaks, ],
  query.feature = meta.feature[top.da.peak, ],
  n = 50000
)
## Matching GC.percent distribution

Enrichment test

We are now ready to perform a statistical test to quantify the probability of observing the motif at the given frequency by chance, comparing with a background set of peaks matched for GC content, by specifying a foreground and the previously calculated matched background:

# test enrichment
enriched.motifs <- FindMotifs(
  object = seu.s,
  features = top.da.peak,
  background = peaks.matched
)
## Testing motif enrichment in 400 regions
# Let's check the results
enriched.motifs

Visualization

We can also plot the position weight matrices for the motifs, so we can visualize the different motif sequences. it is finally the time to visualize our results!

TFs = str_replace(head(rownames(enriched.motifs)),"-","_")
MotifPlot(object = seu.s,motifs = TFs,assay = 'ATAC')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

We can also visualize these marker peaks on a violin plot, feature plot, dot plot, heat map, or any visualization tool in Seurat.

Finally, as an exercise, you can now repeat the same procedure; but this time, we identify peaks the other way around: Peaks that are more accessible in cluster 1 as compared to cluster 0

TF footprints

Transcription factor (TF) footprinting allows for the prediction of the precise binding location of a TF at a particular locus. For both bulk and scATAC-Seq, this is because the DNA bases that are directly bound by the TF are actually protected from transposition while the DNA bases immediately adjacent to TF binding are accessible, thus leaving a footprint. This allows us to generate average ATAC-seq profiles around binding sites of particular TF (a TF footprint). The last task in this vignette is to generate some TF footprints!

Ideally, TF footprinting is performed at a single site to determine the precise binding location of the TF. However, in practice, this requires very high sequencing depth, often much higher depth than what most users would obtain from either bulk or single-cell ATAC-seq. To get around this problem, we can combine Tn5 insertion locations across many instances of predicted TF binding. For example, we can take all peaks that harbor a CTCF motif and make an aggregate TF footprint for CTCF across the whole genome. The accuracy of this footprint relies on generating a reliable curated list of predicted binding sites for the TF of interest.

Adding motif information to the object

To prepare this, we need to add the HOCOMOCO motifs to our object in a way that is compatible with the footprinting functions. Why we need to to this? Generally, to facilitate motif analysis in Signac, we have to create the Motif class to store all the required information, including a list of position weight matrices (PWMs) or position frequency matrices (PFMs) and a motif occurrence matrix. Here, the AddMotifs() function construct a Motif object and adds it to our dataset, along with other information such as the base composition of each peak. Even more generally, a motif object can be added to any Seurat-compatible assay using the SetAssayData() function. The following chunk takes a few minutes to run, so please be patient.

DefaultAssay(seu.s) <- 'ATAC'
seu.s <- AddMotifs(seu.s, genome = BSgenome.Mmusculus.UCSC.mm10, pfm = motifsProcessed)
## Building motif matrix
## Finding motif positions
## Creating Motif object

Alternatively, if we used a different motif database such as JASPAR, we could add it to our object in a similar fashion:

library(JASPAR2020)
# Potentially specifying the species is also needed here
motifDB = getMatrixSet(JASPAR2020, opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE))
)

Footprint computation

Now we can footprint any motif that we have positional information for. Here, we can run it just for a few selected TFs. The footprinting function computes the normalized observed vs. expected Tn5 insertion frequency for each position surrounding a set of motif instances. By default, this includes every instance of the motif in the genome. We can instead use the in.peaks = TRUE parameter to include only those motifs that fall inside a peak in the assay. The Footprint() function gathers all the required data and stores it in the assay.

Again, this will take some minutes to run.

# Pre-selected TFs that should be more active for cluster 0 and 1. ERR3 + SOX2 should be more in ES cells (cluster 1) and TEAD4 + PITX1 in neuronal ones (cluster 0).
TFs = c("ERR3_MOUSE.H11MO.0.C", "SOX2_MOUSE.H11MO.0.A", "TEAD4_MOUSE.H11MO.0.A","PITX1_MOUSE.H11MO.0.C")

# gather the footprinting information for sets of motifs
seu.s <- Footprint(
  object = seu.s,
  motif.name = TFs,
  in.peaks = FALSE,
  genome = BSgenome.Mmusculus.UCSC.mm10
)
## Computing Tn5 insertion bias
## Extracting reads in requested region
## Computing observed Tn5 insertions per base
## Warning in .Seqinfo.mergexy(x, y): Each of the 2 combined objects has sequence levels not in the other:
##   - in 'x': chrM, chr1_GL455991_alt, chr1_GL455992_alt, chr1_GL455993_alt, chr1_GL456005_alt, chr1_JH584315_alt, chr1_JH584320_alt, chr1_JH584321_alt, chr1_JH584322_alt, chr2_GL456024_alt, chr3_GL456006_alt, chr3_GL456007_alt, chr3_GL456008_alt, chr3_GL456042_alt, chr3_GL456044_alt, chr3_GL456045_alt, chr3_GL456048_alt, chr3_GL456049_alt, chr3_JH584323_alt, chr4_GL455994_alt, chr4_GL456009_alt, chr4_GL456010_alt, chr4_GL456053_alt, chr4_GL456064_alt, chr4_GL456075_alt, chr4_GL456076_alt, chr4_GL456077_alt, chr4_JH584268_alt, chr4_JH584269_alt, chr4_JH584324_alt, chr4_JH584325_alt, chr4_JH584326_alt, chr5_GL455995_alt, chr5_GL456011_alt, chr6_GL456012_alt, chr6_GL456025_alt, chr6_GL456026_alt, chr6_GL456054_alt, chr6_GL456065_alt, chr6_JH584264_alt, chr7_GL455989_alt, chr7_GL456013_alt, chr7_GL456014_alt, chr8_GL455996_alt, chr8_GL455997_alt, chr10_GL456015_alt, chr11_GL455998_alt, chr11_GL456016_alt, chr11_GL456060_alt, chr11_JH584265_alt, chr11_JH584316_alt, chr11_JH584317_alt, chr11_JH584327_alt, chr12_GL456017_alt, chr12_GL456068_alt, chr12_GL456074_alt, chr12_GL456078_alt, chr12_GL456349_alt, chr13_GL455990_alt, chr13_GL455999_alt, chr13_JH584305_alt, chr14_GL456019_alt, chr14_GL456020_alt, chr15_GL456000_alt, chr15_JH584270_alt, chr16_GL456001_alt, chr16_GL456028_alt, chr16_JH584306_alt, chr16_JH584307_alt, chr16_JH584310_alt, chr16_JH584311_alt, chr16_JH584312_alt, chr16_JH584313_alt, chr16_JH584314_alt, chr17_GL456002_alt, chr17_GL456021_alt, chr17_GL456022_alt, chr17_GL456069_alt, chr17_JH584266_alt, chr17_JH584267_alt, chr17_JH584308_alt, chr17_JH584309_alt, chr17_JH584328_alt, chr17_JH590470_alt, chr18_GL456070_alt, chr18_JH584318_alt, chr19_GL456079_alt, chr19_JH584319_alt, chrX_GL456003_alt, chrX_GL456004_alt, chrX_GL456031_alt, chrX_GL456032_alt, chrX_GL456033_alt, chrY_GL456071_alt, chrY_GL456072_alt, chrY_GL456073_alt, chrY_GL456080_alt, chrY_GL456081_alt, chrY_GL456082_alt, chrna_GL456050_alt, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr7_GL456219_random, chrX_GL456233_random, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584302_random, chrY_JH584303_random, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chr1_KV575232_fix, chr1_KV575233_fix, chr1_KV575234_fix, chr2_KV575235_fix, chr2_KV575236_fix, chr3_KQ030484_fix, chr3_KZ289064_fix, chr3_KZ289065_fix, chr3_KZ289066_fix, chr4_JH792826_fix, chr4_KQ030485_fix, chr4_KQ030486_fix, chr4_KQ030487_fix, chr4_KQ030488_fix, chr4_KQ030489_fix, chr4_KZ289067_fix, chr4_KZ289068_fix, chr4_KZ289069_fix, chr4_KZ289070_fix, chr5_JH792827_fix, chr5_KV575237_fix, chr6_KK082442_fix, chr6_KK082443_fix, chr7_JH792828_fix, chr7_KV575238_fix, chr7_KV575239_fix, chr8_KV575240_fix, chr9_KB469738_fix, chr9_KQ030490_fix, chr10_KQ030491_fix, chr10_KZ289071_fix, chr10_KZ289072_fix, chr11_KB469739_fix, chr11_KZ289076_fix, chr12_KB469740_fix, chr12_KZ289082_fix, chr14_KZ289083_fix, chr14_KZ289084_fix, chr15_KQ030492_fix, chr15_KQ030493_fix, chr15_KV575241_fix, chr15_KZ289085_fix, chr15_KZ289086_fix, chr16_KB469741_fix, chr17_KB469742_fix, chr17_KZ289087_fix, chr17_KZ289088_fix, chr17_KZ289089_fix, chr18_JH792829_fix, chr18_KZ289090_fix, chr18_KZ289091_fix, chr19_JH792830_fix, chr19_KQ030494_fix, chr19_KV575242_fix, chrX_JH792831_fix, chrX_KQ030495_fix, chrX_KQ030496_fix, chrX_KQ030497_fix, chrX_KZ289092_fix, chrX_KZ289093_fix, chrX_KZ289094_fix, chrX_KZ289095_fix, chrY_JH792832_fix, chrY_JH792833_fix, chrY_JH792834_fix, chr1_KK082441_alt, chr11_KZ289073_alt, chr11_KZ289074_alt, chr11_KZ289075_alt, chr11_KZ289077_alt, chr11_KZ289078_alt, chr11_KZ289079_alt, chr11_KZ289080_alt, chr11_KZ289081_alt
##   - in 'y': GL456216.1, GL456221.1, GL456233.1, JH584295.1, JH584304.1
##   Make sure to always combine/compare objects based on the same reference
##   genome (use suppressWarnings() to suppress this warning).
## Computing base composition at motif sites
## Computing expected Tn5 insertions per base
## Finding + strand cut sites
## Finding - strand cut sites
## Computing observed Tn5 insertions per base
## Warning in .Seqinfo.mergexy(x, y): Each of the 2 combined objects has sequence levels not in the other:
##   - in 'x': chrM, chr1_GL455991_alt, chr1_GL455992_alt, chr1_GL455993_alt, chr1_GL456005_alt, chr1_JH584315_alt, chr1_JH584320_alt, chr1_JH584321_alt, chr1_JH584322_alt, chr2_GL456024_alt, chr3_GL456006_alt, chr3_GL456007_alt, chr3_GL456008_alt, chr3_GL456042_alt, chr3_GL456044_alt, chr3_GL456045_alt, chr3_GL456048_alt, chr3_GL456049_alt, chr3_JH584323_alt, chr4_GL455994_alt, chr4_GL456009_alt, chr4_GL456010_alt, chr4_GL456053_alt, chr4_GL456064_alt, chr4_GL456075_alt, chr4_GL456076_alt, chr4_GL456077_alt, chr4_JH584268_alt, chr4_JH584269_alt, chr4_JH584324_alt, chr4_JH584325_alt, chr4_JH584326_alt, chr5_GL455995_alt, chr5_GL456011_alt, chr6_GL456012_alt, chr6_GL456025_alt, chr6_GL456026_alt, chr6_GL456054_alt, chr6_GL456065_alt, chr6_JH584264_alt, chr7_GL455989_alt, chr7_GL456013_alt, chr7_GL456014_alt, chr8_GL455996_alt, chr8_GL455997_alt, chr10_GL456015_alt, chr11_GL455998_alt, chr11_GL456016_alt, chr11_GL456060_alt, chr11_JH584265_alt, chr11_JH584316_alt, chr11_JH584317_alt, chr11_JH584327_alt, chr12_GL456017_alt, chr12_GL456068_alt, chr12_GL456074_alt, chr12_GL456078_alt, chr12_GL456349_alt, chr13_GL455990_alt, chr13_GL455999_alt, chr13_JH584305_alt, chr14_GL456019_alt, chr14_GL456020_alt, chr15_GL456000_alt, chr15_JH584270_alt, chr16_GL456001_alt, chr16_GL456028_alt, chr16_JH584306_alt, chr16_JH584307_alt, chr16_JH584310_alt, chr16_JH584311_alt, chr16_JH584312_alt, chr16_JH584313_alt, chr16_JH584314_alt, chr17_GL456002_alt, chr17_GL456021_alt, chr17_GL456022_alt, chr17_GL456069_alt, chr17_JH584266_alt, chr17_JH584267_alt, chr17_JH584308_alt, chr17_JH584309_alt, chr17_JH584328_alt, chr17_JH590470_alt, chr18_GL456070_alt, chr18_JH584318_alt, chr19_GL456079_alt, chr19_JH584319_alt, chrX_GL456003_alt, chrX_GL456004_alt, chrX_GL456031_alt, chrX_GL456032_alt, chrX_GL456033_alt, chrY_GL456071_alt, chrY_GL456072_alt, chrY_GL456073_alt, chrY_GL456080_alt, chrY_GL456081_alt, chrY_GL456082_alt, chrna_GL456050_alt, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr7_GL456219_random, chrX_GL456233_random, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584302_random, chrY_JH584303_random, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chr1_KV575232_fix, chr1_KV575233_fix, chr1_KV575234_fix, chr2_KV575235_fix, chr2_KV575236_fix, chr3_KQ030484_fix, chr3_KZ289064_fix, chr3_KZ289065_fix, chr3_KZ289066_fix, chr4_JH792826_fix, chr4_KQ030485_fix, chr4_KQ030486_fix, chr4_KQ030487_fix, chr4_KQ030488_fix, chr4_KQ030489_fix, chr4_KZ289067_fix, chr4_KZ289068_fix, chr4_KZ289069_fix, chr4_KZ289070_fix, chr5_JH792827_fix, chr5_KV575237_fix, chr6_KK082442_fix, chr6_KK082443_fix, chr7_JH792828_fix, chr7_KV575238_fix, chr7_KV575239_fix, chr8_KV575240_fix, chr9_KB469738_fix, chr9_KQ030490_fix, chr10_KQ030491_fix, chr10_KZ289071_fix, chr10_KZ289072_fix, chr11_KB469739_fix, chr11_KZ289076_fix, chr12_KB469740_fix, chr12_KZ289082_fix, chr14_KZ289083_fix, chr14_KZ289084_fix, chr15_KQ030492_fix, chr15_KQ030493_fix, chr15_KV575241_fix, chr15_KZ289085_fix, chr15_KZ289086_fix, chr16_KB469741_fix, chr17_KB469742_fix, chr17_KZ289087_fix, chr17_KZ289088_fix, chr17_KZ289089_fix, chr18_JH792829_fix, chr18_KZ289090_fix, chr18_KZ289091_fix, chr19_JH792830_fix, chr19_KQ030494_fix, chr19_KV575242_fix, chrX_JH792831_fix, chrX_KQ030495_fix, chrX_KQ030496_fix, chrX_KQ030497_fix, chrX_KZ289092_fix, chrX_KZ289093_fix, chrX_KZ289094_fix, chrX_KZ289095_fix, chrY_JH792832_fix, chrY_JH792833_fix, chrY_JH792834_fix, chr1_KK082441_alt, chr11_KZ289073_alt, chr11_KZ289074_alt, chr11_KZ289075_alt, chr11_KZ289077_alt, chr11_KZ289078_alt, chr11_KZ289079_alt, chr11_KZ289080_alt, chr11_KZ289081_alt
##   - in 'y': GL456216.1, GL456221.1, GL456233.1, JH584295.1, JH584304.1
##   Make sure to always combine/compare objects based on the same reference
##   genome (use suppressWarnings() to suppress this warning).
## Computing base composition at motif sites
## Computing expected Tn5 insertions per base
## Finding + strand cut sites
## Finding - strand cut sites
## Computing observed Tn5 insertions per base
## Warning in .Seqinfo.mergexy(x, y): Each of the 2 combined objects has sequence levels not in the other:
##   - in 'x': chrM, chr1_GL455991_alt, chr1_GL455992_alt, chr1_GL455993_alt, chr1_GL456005_alt, chr1_JH584315_alt, chr1_JH584320_alt, chr1_JH584321_alt, chr1_JH584322_alt, chr2_GL456024_alt, chr3_GL456006_alt, chr3_GL456007_alt, chr3_GL456008_alt, chr3_GL456042_alt, chr3_GL456044_alt, chr3_GL456045_alt, chr3_GL456048_alt, chr3_GL456049_alt, chr3_JH584323_alt, chr4_GL455994_alt, chr4_GL456009_alt, chr4_GL456010_alt, chr4_GL456053_alt, chr4_GL456064_alt, chr4_GL456075_alt, chr4_GL456076_alt, chr4_GL456077_alt, chr4_JH584268_alt, chr4_JH584269_alt, chr4_JH584324_alt, chr4_JH584325_alt, chr4_JH584326_alt, chr5_GL455995_alt, chr5_GL456011_alt, chr6_GL456012_alt, chr6_GL456025_alt, chr6_GL456026_alt, chr6_GL456054_alt, chr6_GL456065_alt, chr6_JH584264_alt, chr7_GL455989_alt, chr7_GL456013_alt, chr7_GL456014_alt, chr8_GL455996_alt, chr8_GL455997_alt, chr10_GL456015_alt, chr11_GL455998_alt, chr11_GL456016_alt, chr11_GL456060_alt, chr11_JH584265_alt, chr11_JH584316_alt, chr11_JH584317_alt, chr11_JH584327_alt, chr12_GL456017_alt, chr12_GL456068_alt, chr12_GL456074_alt, chr12_GL456078_alt, chr12_GL456349_alt, chr13_GL455990_alt, chr13_GL455999_alt, chr13_JH584305_alt, chr14_GL456019_alt, chr14_GL456020_alt, chr15_GL456000_alt, chr15_JH584270_alt, chr16_GL456001_alt, chr16_GL456028_alt, chr16_JH584306_alt, chr16_JH584307_alt, chr16_JH584310_alt, chr16_JH584311_alt, chr16_JH584312_alt, chr16_JH584313_alt, chr16_JH584314_alt, chr17_GL456002_alt, chr17_GL456021_alt, chr17_GL456022_alt, chr17_GL456069_alt, chr17_JH584266_alt, chr17_JH584267_alt, chr17_JH584308_alt, chr17_JH584309_alt, chr17_JH584328_alt, chr17_JH590470_alt, chr18_GL456070_alt, chr18_JH584318_alt, chr19_GL456079_alt, chr19_JH584319_alt, chrX_GL456003_alt, chrX_GL456004_alt, chrX_GL456031_alt, chrX_GL456032_alt, chrX_GL456033_alt, chrY_GL456071_alt, chrY_GL456072_alt, chrY_GL456073_alt, chrY_GL456080_alt, chrY_GL456081_alt, chrY_GL456082_alt, chrna_GL456050_alt, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr7_GL456219_random, chrX_GL456233_random, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584302_random, chrY_JH584303_random, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chr1_KV575232_fix, chr1_KV575233_fix, chr1_KV575234_fix, chr2_KV575235_fix, chr2_KV575236_fix, chr3_KQ030484_fix, chr3_KZ289064_fix, chr3_KZ289065_fix, chr3_KZ289066_fix, chr4_JH792826_fix, chr4_KQ030485_fix, chr4_KQ030486_fix, chr4_KQ030487_fix, chr4_KQ030488_fix, chr4_KQ030489_fix, chr4_KZ289067_fix, chr4_KZ289068_fix, chr4_KZ289069_fix, chr4_KZ289070_fix, chr5_JH792827_fix, chr5_KV575237_fix, chr6_KK082442_fix, chr6_KK082443_fix, chr7_JH792828_fix, chr7_KV575238_fix, chr7_KV575239_fix, chr8_KV575240_fix, chr9_KB469738_fix, chr9_KQ030490_fix, chr10_KQ030491_fix, chr10_KZ289071_fix, chr10_KZ289072_fix, chr11_KB469739_fix, chr11_KZ289076_fix, chr12_KB469740_fix, chr12_KZ289082_fix, chr14_KZ289083_fix, chr14_KZ289084_fix, chr15_KQ030492_fix, chr15_KQ030493_fix, chr15_KV575241_fix, chr15_KZ289085_fix, chr15_KZ289086_fix, chr16_KB469741_fix, chr17_KB469742_fix, chr17_KZ289087_fix, chr17_KZ289088_fix, chr17_KZ289089_fix, chr18_JH792829_fix, chr18_KZ289090_fix, chr18_KZ289091_fix, chr19_JH792830_fix, chr19_KQ030494_fix, chr19_KV575242_fix, chrX_JH792831_fix, chrX_KQ030495_fix, chrX_KQ030496_fix, chrX_KQ030497_fix, chrX_KZ289092_fix, chrX_KZ289093_fix, chrX_KZ289094_fix, chrX_KZ289095_fix, chrY_JH792832_fix, chrY_JH792833_fix, chrY_JH792834_fix, chr1_KK082441_alt, chr11_KZ289073_alt, chr11_KZ289074_alt, chr11_KZ289075_alt, chr11_KZ289077_alt, chr11_KZ289078_alt, chr11_KZ289079_alt, chr11_KZ289080_alt, chr11_KZ289081_alt
##   - in 'y': GL456216.1, GL456221.1, GL456233.1, JH584295.1, JH584304.1
##   Make sure to always combine/compare objects based on the same reference
##   genome (use suppressWarnings() to suppress this warning).
## Computing base composition at motif sites
## Computing expected Tn5 insertions per base
## Finding + strand cut sites
## Finding - strand cut sites
## Computing observed Tn5 insertions per base
## Warning in .Seqinfo.mergexy(x, y): Each of the 2 combined objects has sequence levels not in the other:
##   - in 'x': chrM, chr1_GL455991_alt, chr1_GL455992_alt, chr1_GL455993_alt, chr1_GL456005_alt, chr1_JH584315_alt, chr1_JH584320_alt, chr1_JH584321_alt, chr1_JH584322_alt, chr2_GL456024_alt, chr3_GL456006_alt, chr3_GL456007_alt, chr3_GL456008_alt, chr3_GL456042_alt, chr3_GL456044_alt, chr3_GL456045_alt, chr3_GL456048_alt, chr3_GL456049_alt, chr3_JH584323_alt, chr4_GL455994_alt, chr4_GL456009_alt, chr4_GL456010_alt, chr4_GL456053_alt, chr4_GL456064_alt, chr4_GL456075_alt, chr4_GL456076_alt, chr4_GL456077_alt, chr4_JH584268_alt, chr4_JH584269_alt, chr4_JH584324_alt, chr4_JH584325_alt, chr4_JH584326_alt, chr5_GL455995_alt, chr5_GL456011_alt, chr6_GL456012_alt, chr6_GL456025_alt, chr6_GL456026_alt, chr6_GL456054_alt, chr6_GL456065_alt, chr6_JH584264_alt, chr7_GL455989_alt, chr7_GL456013_alt, chr7_GL456014_alt, chr8_GL455996_alt, chr8_GL455997_alt, chr10_GL456015_alt, chr11_GL455998_alt, chr11_GL456016_alt, chr11_GL456060_alt, chr11_JH584265_alt, chr11_JH584316_alt, chr11_JH584317_alt, chr11_JH584327_alt, chr12_GL456017_alt, chr12_GL456068_alt, chr12_GL456074_alt, chr12_GL456078_alt, chr12_GL456349_alt, chr13_GL455990_alt, chr13_GL455999_alt, chr13_JH584305_alt, chr14_GL456019_alt, chr14_GL456020_alt, chr15_GL456000_alt, chr15_JH584270_alt, chr16_GL456001_alt, chr16_GL456028_alt, chr16_JH584306_alt, chr16_JH584307_alt, chr16_JH584310_alt, chr16_JH584311_alt, chr16_JH584312_alt, chr16_JH584313_alt, chr16_JH584314_alt, chr17_GL456002_alt, chr17_GL456021_alt, chr17_GL456022_alt, chr17_GL456069_alt, chr17_JH584266_alt, chr17_JH584267_alt, chr17_JH584308_alt, chr17_JH584309_alt, chr17_JH584328_alt, chr17_JH590470_alt, chr18_GL456070_alt, chr18_JH584318_alt, chr19_GL456079_alt, chr19_JH584319_alt, chrX_GL456003_alt, chrX_GL456004_alt, chrX_GL456031_alt, chrX_GL456032_alt, chrX_GL456033_alt, chrY_GL456071_alt, chrY_GL456072_alt, chrY_GL456073_alt, chrY_GL456080_alt, chrY_GL456081_alt, chrY_GL456082_alt, chrna_GL456050_alt, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr7_GL456219_random, chrX_GL456233_random, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584302_random, chrY_JH584303_random, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chr1_KV575232_fix, chr1_KV575233_fix, chr1_KV575234_fix, chr2_KV575235_fix, chr2_KV575236_fix, chr3_KQ030484_fix, chr3_KZ289064_fix, chr3_KZ289065_fix, chr3_KZ289066_fix, chr4_JH792826_fix, chr4_KQ030485_fix, chr4_KQ030486_fix, chr4_KQ030487_fix, chr4_KQ030488_fix, chr4_KQ030489_fix, chr4_KZ289067_fix, chr4_KZ289068_fix, chr4_KZ289069_fix, chr4_KZ289070_fix, chr5_JH792827_fix, chr5_KV575237_fix, chr6_KK082442_fix, chr6_KK082443_fix, chr7_JH792828_fix, chr7_KV575238_fix, chr7_KV575239_fix, chr8_KV575240_fix, chr9_KB469738_fix, chr9_KQ030490_fix, chr10_KQ030491_fix, chr10_KZ289071_fix, chr10_KZ289072_fix, chr11_KB469739_fix, chr11_KZ289076_fix, chr12_KB469740_fix, chr12_KZ289082_fix, chr14_KZ289083_fix, chr14_KZ289084_fix, chr15_KQ030492_fix, chr15_KQ030493_fix, chr15_KV575241_fix, chr15_KZ289085_fix, chr15_KZ289086_fix, chr16_KB469741_fix, chr17_KB469742_fix, chr17_KZ289087_fix, chr17_KZ289088_fix, chr17_KZ289089_fix, chr18_JH792829_fix, chr18_KZ289090_fix, chr18_KZ289091_fix, chr19_JH792830_fix, chr19_KQ030494_fix, chr19_KV575242_fix, chrX_JH792831_fix, chrX_KQ030495_fix, chrX_KQ030496_fix, chrX_KQ030497_fix, chrX_KZ289092_fix, chrX_KZ289093_fix, chrX_KZ289094_fix, chrX_KZ289095_fix, chrY_JH792832_fix, chrY_JH792833_fix, chrY_JH792834_fix, chr1_KK082441_alt, chr11_KZ289073_alt, chr11_KZ289074_alt, chr11_KZ289075_alt, chr11_KZ289077_alt, chr11_KZ289078_alt, chr11_KZ289079_alt, chr11_KZ289080_alt, chr11_KZ289081_alt
##   - in 'y': GL456216.1, GL456221.1, GL456233.1, JH584295.1, JH584304.1
##   Make sure to always combine/compare objects based on the same reference
##   genome (use suppressWarnings() to suppress this warning).
## Computing base composition at motif sites
## Computing expected Tn5 insertions per base
## Finding + strand cut sites
## Finding - strand cut sites

Footprint visualization

We can now plot the footprinted motifs using the PlotFootprint() function.

TFs = c("ERR3_MOUSE.H11MO.0.C", "SOX2_MOUSE.H11MO.0.A", "TEAD4_MOUSE.H11MO.0.A","PITX1_MOUSE.H11MO.0.C")

# Plot the footprint data for each group of cells and each cluster
p2 <- PlotFootprint(seu.s, features = TFs[1:2])
p2 + patchwork::plot_layout(ncol = 2) & theme(plot.title = element_text(size = 10))
## Warning: Removed 2550 rows containing missing values (geom_label_repel).
## Warning: Removed 2580 rows containing missing values (geom_label_repel).

p3 <- PlotFootprint(seu.s, features = TFs[3:4])
p3 + patchwork::plot_layout(ncol = 2) & theme(plot.title = element_text(size = 10))
## Warning: Removed 2575 rows containing missing values (geom_label_repel).
## Warning: Removed 2550 rows containing missing values (geom_label_repel).

This concludes the TF footprint analysis as part of this vignette!

Save object to disk

We reached the end of the vignette. We again save our updated Seurat object to disk with a new name, in analogy to what we did in the other vignettes.

saveRDS(seu.s, file = paste0(outFolder,"obj.filt.TFAnalysis.rds"))

Further reading

TF motif clustering: Motif archetypes

Motif archetypes discussion

Kulakovskiy, I.V., Vorontsov, I.E., Yevshin, I.S., Sharipov, R.N., Fedorova, A.D., Rumynskiy, E.I., Medvedeva, Y.A., Magana-Mora, A., Bajic, V.B., Papatsenko, D.A. and Kolpakov, F.A., 2018. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic acids research, 46(D1), pp.D252-D259.

Session info

It is good practice to print the so-called session info at the end of an R script, which prints all loaded libraries, their versions etc. This can be helpful for reproducibility and recapitulating which package versions have been used to produce the results obtained above.

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggseqlogo_0.1                      BiocParallel_1.30.0               
##  [3] BSgenome.Mmusculus.UCSC.mm10_1.4.3 BSgenome_1.64.0                   
##  [5] rtracklayer_1.56.0                 Biostrings_2.64.0                 
##  [7] XVector_0.36.0                     GenomicRanges_1.48.0              
##  [9] GenomeInfoDb_1.32.1                IRanges_2.30.0                    
## [11] S4Vectors_0.34.0                   BiocGenerics_0.42.0               
## [13] motifmatchr_1.18.0                 universalmotif_1.14.0             
## [15] TFBSTools_1.34.0                   chromVAR_1.18.0                   
## [17] patchwork_1.1.1                    forcats_0.5.1                     
## [19] stringr_1.4.0                      dplyr_1.0.9                       
## [21] purrr_0.3.4                        readr_2.1.2                       
## [23] tidyr_1.2.0                        tibble_3.1.7                      
## [25] ggplot2_3.3.6                      tidyverse_1.3.1                   
## [27] sp_1.4-7                           SeuratObject_4.1.0                
## [29] Seurat_4.1.1                       Signac_1.6.0                      
## 
## loaded via a namespace (and not attached):
##   [1] SnowballC_0.7.0             scattermore_0.8            
##   [3] R.methodsS3_1.8.1           nabor_0.5.0                
##   [5] bit64_4.0.5                 knitr_1.39                 
##   [7] irlba_2.3.5                 DelayedArray_0.22.0        
##   [9] R.utils_2.11.0              data.table_1.14.2          
##  [11] rpart_4.1.16                KEGGREST_1.36.0            
##  [13] RCurl_1.98-1.6              generics_0.1.2             
##  [15] cowplot_1.1.1               RSQLite_2.2.13             
##  [17] RANN_2.6.1                  future_1.25.0              
##  [19] bit_4.0.4                   tzdb_0.3.0                 
##  [21] spatstat.data_2.2-0         xml2_1.3.3                 
##  [23] lubridate_1.8.0             httpuv_1.6.5               
##  [25] SummarizedExperiment_1.26.1 assertthat_0.2.1           
##  [27] DirichletMultinomial_1.38.0 xfun_0.30                  
##  [29] hms_1.1.1                   jquerylib_0.1.4            
##  [31] evaluate_0.15               promises_1.2.0.1           
##  [33] fansi_1.0.3                 restfulr_0.0.13            
##  [35] caTools_1.18.2              dbplyr_2.1.1               
##  [37] readxl_1.4.0                igraph_1.3.1               
##  [39] DBI_1.1.2                   htmlwidgets_1.5.4          
##  [41] sparsesvd_0.2               spatstat.geom_2.4-0        
##  [43] ellipsis_0.3.2              backports_1.4.1            
##  [45] annotate_1.74.0             deldir_1.0-6               
##  [47] MatrixGenerics_1.8.0        vctrs_0.4.1                
##  [49] Biobase_2.56.0              ROCR_1.0-11                
##  [51] abind_1.4-5                 cachem_1.0.6               
##  [53] withr_2.5.0                 ggforce_0.3.3              
##  [55] progressr_0.10.0            sctransform_0.3.3          
##  [57] GenomicAlignments_1.32.0    goftest_1.2-3              
##  [59] cluster_2.1.3               lazyeval_0.2.2             
##  [61] seqLogo_1.62.0              crayon_1.5.1               
##  [63] pkgconfig_2.0.3             slam_0.1-50                
##  [65] labeling_0.4.2              tweenr_1.0.2               
##  [67] nlme_3.1-157                rlang_1.0.2                
##  [69] globals_0.14.0              lifecycle_1.0.1            
##  [71] miniUI_0.1.1.1              modelr_0.1.8               
##  [73] cellranger_1.1.0            polyclip_1.10-0            
##  [75] matrixStats_0.62.0          lmtest_0.9-40              
##  [77] Matrix_1.4-1                zoo_1.8-10                 
##  [79] reprex_2.0.1                ggridges_0.5.3             
##  [81] png_0.1-7                   viridisLite_0.4.0          
##  [83] rjson_0.2.21                bitops_1.0-7               
##  [85] R.oo_1.24.0                 KernSmooth_2.23-20         
##  [87] blob_1.2.3                  parallelly_1.31.1          
##  [89] spatstat.random_2.2-0       CNEr_1.32.0                
##  [91] scales_1.2.0                memoise_2.0.1              
##  [93] magrittr_2.0.3              plyr_1.8.7                 
##  [95] ica_1.0-2                   zlibbioc_1.42.0            
##  [97] compiler_4.2.0              BiocIO_1.6.0               
##  [99] RColorBrewer_1.1-3          fitdistrplus_1.1-8         
## [101] Rsamtools_2.12.0            cli_3.3.0                  
## [103] listenv_0.8.0               pbapply_1.5-0              
## [105] MASS_7.3-56                 mgcv_1.8-40                
## [107] tidyselect_1.1.2            stringi_1.7.6              
## [109] highr_0.9                   yaml_2.3.5                 
## [111] ggrepel_0.9.1               grid_4.2.0                 
## [113] sass_0.4.1                  fastmatch_1.1-3            
## [115] tools_4.2.0                 future.apply_1.9.0         
## [117] parallel_4.2.0              rstudioapi_0.13            
## [119] TFMPvalue_0.0.8             lsa_0.73.2                 
## [121] gridExtra_2.3               farver_2.1.0               
## [123] Rtsne_0.16                  digest_0.6.29              
## [125] rgeos_0.5-9                 shiny_1.7.1                
## [127] pracma_2.3.8                qlcMatrix_0.9.7            
## [129] Rcpp_1.0.8.3                broom_0.8.0                
## [131] later_1.3.0                 RcppAnnoy_0.0.19           
## [133] httr_1.4.3                  AnnotationDbi_1.58.0       
## [135] colorspace_2.0-3            rvest_1.0.2                
## [137] XML_3.99-0.9                fs_1.5.2                   
## [139] tensor_1.5                  reticulate_1.24            
## [141] splines_4.2.0               uwot_0.1.11                
## [143] RcppRoll_0.3.0              spatstat.utils_2.3-0       
## [145] plotly_4.10.0               xtable_1.8-4               
## [147] jsonlite_1.8.0              poweRlaw_0.70.6            
## [149] R6_2.5.1                    pillar_1.7.0               
## [151] htmltools_0.5.2             mime_0.12                  
## [153] glue_1.6.2                  fastmap_1.1.0              
## [155] DT_0.22                     codetools_0.2-18           
## [157] utf8_1.2.2                  lattice_0.20-45            
## [159] bslib_0.3.1                 spatstat.sparse_2.1-1      
## [161] leiden_0.3.10               gtools_3.9.2               
## [163] GO.db_3.15.0                survival_3.2-13            
## [165] limma_3.52.0                rmarkdown_2.14             
## [167] docopt_0.7.1                munsell_0.5.0              
## [169] GenomeInfoDbData_1.2.8      haven_2.5.0                
## [171] reshape2_1.4.4              gtable_0.3.0               
## [173] spatstat.core_2.4-2